Abstract

For any partition of $\{1, 2, \ldots, n\}$ we define its increments $X_i$, $1 \le i \le n$,
by $X_i = 1$ if $i$ is the smallest element in the partition block that contains
it, and $X_i = 0$ otherwise. We prove that for partially exchangeable random
partitions (where the probability of a partition depends only on its block
sizes in order of appearance), the law of the increments uniquely determines
the law of the partition. One consequence is that the Chinese
Restaurant Process CRP($\theta$) (the partition with distribution given by the
Ewens sampling formula with parameter $\theta$) is the only exchangeable random
partition with independent increments.

1 Introduction 

Random partitions have been studied extensively during the past thirty years, 
and have found various applications in population biology, Bayesian statistics, 
combinatorics, and statistical physics; see [2] for an in-depth survey. Exchange- 
able random partitions were introduced by Kingman, motivated by applications 
in genetics. Partially exchangeable random partitions were introduced by
Pitman. We recall their definition:

Definition 1. Consider a random partition $\Pi_n = \{A_1, A_2, \ldots, A_k\}$ of
$[n] = \{1, 2, \ldots, n\}$, where the blocks $A_i$ are listed in order of appearance
(i.e. in increasing order of their smallest element). $\Pi_n$ is called partially
exchangeable if its probability depends only on the ordered block sizes:

$$P(\Pi_n) = p(|A_1|, \ldots, |A_k|) \tag{1}$$

for some function $p$ defined on the set of compositions of $n$ (ordered sequences
of positive integers that sum to $n$). $\Pi_n$ is called exchangeable if $p$ is
symmetric, so that its probability depends on the sizes of the blocks but not on their order.

A natural way to look at a partition is to construct it one element at a
time. We start with a single block $\{1\}$, then the next element 2 either joins
the existing block or starts a new one, and so on until $n$. For $i = 1, \ldots, n$ let
$X_i = 1$ if $i$ starts a new block, and $X_i = 0$ otherwise. (Equivalently, $X_i = 1$ if $i$ is
the smallest element in its block, and $X_i = 0$ otherwise.) Note that $X_1 = 1$ always.

Definition 2. Let $\Pi_n$ be a partition of $\{1, 2, \ldots, n\}$. Let $X_i = 1$ if $i$ is the
smallest element in its block, and $X_i = 0$ otherwise. We call the sequence $(X_1, \ldots, X_n)$
the increments of $\Pi_n$.

Clearly the partition determines the increments, but not vice versa. The
increments do not even determine the block sizes: the partitions $\{\{1, 3, 4\}, \{2\}\}$
and $\{\{1, 3\}, \{2, 4\}\}$ have the same increments $X_1 = 1$, $X_2 = 1$, $X_3 = 0$, $X_4 = 0$.
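
The example above can be checked mechanically. The following is a minimal sketch; the function name `increments` and the representation of a partition as a list of Python sets are illustrative choices, not notation from the paper.

```python
def increments(partition, n):
    """Return (X_1, ..., X_n), where X_i = 1 iff i is the smallest element of its block."""
    minima = {min(block) for block in partition}
    return tuple(1 if i in minima else 0 for i in range(1, n + 1))


first = [{1, 3, 4}, {2}]
second = [{1, 3}, {2, 4}]
print(increments(first, 4))   # (1, 1, 0, 0)
print(increments(second, 4))  # (1, 1, 0, 0)
```

Both partitions produce the same increments although their block sizes differ, (3, 1) versus (2, 2).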

For a random partition, the law of the partition induces a law for its
increments on $\{0, 1\}^n$. We prove that if the partition is
partially exchangeable, the law of the increments does determine the law of the
partition:

Theorem 1. If $\Pi_n$ is a partially exchangeable partition of $[n]$, the distribution
of its increments $(X_1, \ldots, X_n)$ uniquely determines the distribution of $\Pi_n$.

Hence we obtain a correspondence between the set of partially exchangeable
laws for partitions, and the set of laws for binary sequences of zeroes and ones.
This correspondence is as close to a bijection as we could possibly hope for,
in the following sense. There are exactly $2^{n-1}$ compositions of $n$, so according
to (1), the law of the partition is determined by the $2^{n-1}$ values of $p$, which are
arbitrary except for the constraint that $P$ is a probability measure (so $p \ge 0$ and
a weighted sum of its values is equal to 1). There are $2^{n-1}$ possible sequences
of increments (since $X_1 = 1$ always), so their law is also determined by $2^{n-1}$
numbers; the constraint on them turns out to be a system of linear inequalities
(plus the obvious constraint that they sum to 1).
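
The two counts in the paragraph above can be confirmed by brute force for small $n$. This is only an illustrative sketch; the helper names `compositions` and `binary_sequences` are hypothetical.

```python
def compositions(n):
    """Yield every composition of n as a tuple of positive integers."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def binary_sequences(n):
    """Yield every 0/1 sequence of length n whose first entry is 1."""
    for bits in range(2 ** (n - 1)):
        yield (1,) + tuple((bits >> j) & 1 for j in range(n - 1))


for n in range(1, 9):
    assert len(list(compositions(n))) == 2 ** (n - 1)
    assert len(set(binary_sequences(n))) == 2 ** (n - 1)
```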

As an application of Theorem 1 we answer a question raised by Jim Pitman.
First we define:

Definition 3. Let $\theta > 0$. The Chinese Restaurant Process with parameter
$\theta$ is the exchangeable random partition of $[n]$ with distribution given by

$$p(n_1, \ldots, n_k) = \theta^k \prod_{i=1}^{k} (n_i - 1)! \Big/ \prod_{i=0}^{n-1} (\theta + i) \tag{2}$$

We denote it by CRP($\theta$).

Equation (2) is equivalent to the well-known Ewens sampling formula, and
the distribution of the block sizes of CRP($\theta$) is also referred to as the Ewens
partition structure. It has also been referred to as the Blackwell-MacQueen
distribution. See [3] for a survey. The name "Chinese Restaurant Process" was
introduced by Lester Dubins and Jim Pitman in the early 1980s.
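
As a sanity check on formula (2), one can evaluate it on every set partition of a small $[n]$ and confirm that the probabilities sum to 1. The sketch below uses exact rational arithmetic; `set_partitions` and `crp_probability` are illustrative names, and the choice $\theta = 3/2$, $n = 5$ is arbitrary.

```python
from fractions import Fraction
from math import factorial


def set_partitions(n):
    """Yield all partitions of {1..n} as lists of blocks in order of appearance."""
    def rec(i, blocks):
        if i > n:
            yield [list(b) for b in blocks]
            return
        for b in blocks:            # i joins an existing block
            b.append(i)
            yield from rec(i + 1, blocks)
            b.pop()
        blocks.append([i])          # i starts a new block
        yield from rec(i + 1, blocks)
        blocks.pop()
    yield from rec(1, [])


def crp_probability(block_sizes, theta):
    """Evaluate formula (2) for the given block sizes."""
    n, k = sum(block_sizes), len(block_sizes)
    numerator = Fraction(theta) ** k
    for size in block_sizes:
        numerator *= factorial(size - 1)
    denominator = Fraction(1)
    for i in range(n):
        denominator *= theta + i
    return numerator / denominator


theta = Fraction(3, 2)
total = sum(crp_probability([len(b) for b in blocks], theta)
            for blocks in set_partitions(5))
print(total)  # 1
```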

CRP($\theta$) has several equivalent descriptions; for example, for $\theta = 1$ it is the
partition induced by the cycles of a uniform random permutation.
The description we are interested in is in terms of its increments $X_i$. It
follows from (2) that those are independent and satisfy

$$P(X_i = 1) = \theta/(i - 1 + \theta) \tag{3}$$

Hence CRP($\theta$) admits a simple construction one element at a time: $i$ starts
a new block with probability $\theta/(i - 1 + \theta)$ (and joins an existing block with
probability proportional to the size of the block). It is easy to prove that this
constructs an exchangeable partition. Jim Pitman asked whether this is the only
exchangeable random partition with the property that the $X_i$ are independent. We
prove that the answer is yes:

Theorem 2. The Chinese Restaurant Process CRP($\theta$) is the only exchangeable
random partition with independent increments.

Hence in this sense, the Chinese Restaurant Process is the simplest exchange- 
able random partition. 
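
The one-element-at-a-time construction of CRP($\theta$) described above can be sketched as a short sampler. This is a minimal illustration assuming $\theta > 0$; the function name `sample_crp` and the use of Python's `random.Random` are choices made here, not part of the paper.

```python
import random


def sample_crp(n, theta, rng):
    """Sample a partition of {1..n}; blocks are returned in order of appearance."""
    blocks = []
    for i in range(1, n + 1):
        # i starts a new block with probability theta / (i - 1 + theta);
        # for i = 1 this probability is 1, so the first block is always {1}.
        if rng.random() < theta / (i - 1 + theta):
            blocks.append([i])
        else:
            # otherwise i joins an existing block, chosen with probability
            # proportional to the current size of the block
            weights = [len(b) for b in blocks]
            rng.choices(blocks, weights=weights, k=1)[0].append(i)
    return blocks


blocks = sample_crp(10, 1.5, random.Random(0))
print(blocks)
```

The decision to start a new block never looks at the current block configuration, which is why the increments of the sampled partition are independent.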

Several other random partitions admit simple representations in terms of
their increments $X_i$. If the increments are not independent, the next simplest
case is to assume some kind of Markov structure. For example, we can require
that the partial sums $S_i = X_1 + \cdots + X_i$ form a Markov chain. This is the same
as requiring that the probability that $i + 1$ starts a new block depend only on
$i$ and on the number of already existing blocks $S_i$. One process that satisfies
this is Pitman's two-parameter generalization of CRP($\theta$).
In this case we have

$$P(X_{i+1} = 1 \mid S_i = k) = (k\alpha + \theta)/(i + \theta)$$

where the parameters satisfy $0 \le \alpha < 1$, $\theta > -\alpha$. For $\alpha = 0$ we obtain CRP($\theta$).
It is an open question to describe all random partitions for which $S_i$ is a Markov
chain.
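
The Markov chain $S_i$ of the two-parameter process can be followed exactly with a small dynamic program, using only the transition probability displayed above. This is a sketch under that assumption; the name `two_parameter_block_count_law` is hypothetical.

```python
from fractions import Fraction


def two_parameter_block_count_law(n, alpha, theta):
    """Return {k: P(S_n = k)} for the chain with
    P(X_{i+1} = 1 | S_i = k) = (k * alpha + theta) / (i + theta)."""
    law = {1: Fraction(1)}  # S_1 = X_1 = 1
    for i in range(1, n):
        nxt = {}
        for k, prob in law.items():
            new_block = (k * alpha + theta) / (i + theta)
            nxt[k + 1] = nxt.get(k + 1, 0) + prob * new_block
            nxt[k] = nxt.get(k, 0) + prob * (1 - new_block)
        law = nxt
    return law


law = two_parameter_block_count_law(8, Fraction(1, 2), Fraction(1))
print(sum(law.values()))  # 1
```

With `alpha = 0` the transition probability no longer depends on `k`, which recovers the independent increments of CRP($\theta$).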

2 Proof of Theorem 1

Let $\Pi_n = \{A_1, A_2, \ldots, A_k\}$ be a partition of $[n]$ with $k$ blocks. We list the
blocks $A_i$ in order of appearance, so $1 \in A_1$, the smallest element not in $A_1$ is
in $A_2$, and so on. Let $B(\Pi_n) = (|A_1|, \ldots, |A_k|)$ be the sizes of its blocks in order
of appearance.

Let $(X_1, \ldots, X_n)$ be the increments of the partition. Since there are $k$ blocks,
there are exactly $k$ ones and $n - k$ zeroes among the increments. We can encode
such a binary sequence by the distances between consecutive 1's. For example,
the sequence 1, 0, 0, 1, 1, 0, 1, 0 will be encoded as 3, 1, 2, 2: the distance between
the first and the second 1 is three, the distance between the second and the
third 1 is one, and so on, with the last entry measuring the distance from the
last 1 to position $n + 1$. Formally, if $a_i$ is the smallest element in $A_i$, then we
know $X_{a_i} = 1$ for all $1 \le i \le k$, so we encode the increments as the $k$-tuple
$D(\Pi_n) = (a_2 - a_1, a_3 - a_2, \ldots, a_k - a_{k-1}, n + 1 - a_k)$. There is a bijection
between such $k$-tuples and binary sequences with $X_1 = 1$, so from now on we
will identify the sequence of increments with its encoding, and work directly
with $k$-tuples.
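
The encoding and its inverse are easy to write down explicitly. The sketch below uses the length-8 sequence 1, 0, 0, 1, 1, 0, 1, 0 as its example; the function names `encode` and `decode` are illustrative.

```python
def encode(increments):
    """Map a 0/1 sequence with first entry 1 to the tuple of gaps
    (a_2 - a_1, ..., a_k - a_{k-1}, n + 1 - a_k)."""
    n = len(increments)
    ones = [i for i, x in enumerate(increments, start=1) if x == 1]
    boundaries = ones + [n + 1]
    return tuple(hi - lo for lo, hi in zip(boundaries, boundaries[1:]))


def decode(gaps):
    """Inverse of encode: each gap d becomes a 1 followed by d - 1 zeroes."""
    sequence = []
    for d in gaps:
        sequence.extend([1] + [0] * (d - 1))
    return tuple(sequence)


print(encode((1, 0, 0, 1, 1, 0, 1, 0)))  # (3, 1, 2, 2)
print(decode((3, 1, 2, 2)))              # (1, 0, 0, 1, 1, 0, 1, 0)
```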

Let $S_{n,k}$ be the set of $k$-tuples of positive integers whose elements add up to $n$. We are interested
in the relationship between $B(\Pi_n)$ and $D(\Pi_n)$; both are in $S_{n,k}$. We define a
partial order relation on $S_{n,k}$ as follows: $(y_1, \ldots, y_k) \ge (z_1, \ldots, z_k)$ iff
$y_1 \ge z_1$, $y_1 + y_2 \ge z_1 + z_2$, $\ldots$, $y_1 + y_2 + \cdots + y_k \ge z_1 + z_2 + \cdots + z_k$. Then we have

Lemma 3. For any partition $\Pi_n$, $B(\Pi_n) \ge D(\Pi_n)$.

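Proof. Let $B(\Pi_n) = (b_1, \ldots, b_k)$, $D(\Pi_n) = (d_1, \ldots, d_k)$. Fix $m$ with $1 \le m < k$
(for $m = k$ both partial sums equal $n$).
Clearly $\sum_{i=1}^{m} b_i = |\bigcup_{i=1}^{m} A_i|$ and $\sum_{i=1}^{m} d_i = a_{m+1} - a_1 = a_{m+1} - 1$, where
$a_{m+1}$ is the smallest element in $A_{m+1}$. Since the blocks $A_i$ are listed in order
of appearance, $a_{m+1}$ is the smallest element outside $\bigcup_{i=1}^{m} A_i$, so it is at most
$|\bigcup_{i=1}^{m} A_i| + 1$; equality occurs iff $\bigcup_{i=1}^{m} A_i = \{1, 2, \ldots, a_{m+1} - 1\}$.
Hence $\sum_{i=1}^{m} d_i \le \sum_{i=1}^{m} b_i$. □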
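
Lemma 3 can also be confirmed by exhaustive enumeration for a small $n$. The sketch below checks all set partitions of $[6]$; the helper names are illustrative, and `dominates` implements the partial-sum order defined above.

```python
from itertools import accumulate


def set_partitions(n):
    """Yield all partitions of {1..n} as lists of blocks in order of appearance."""
    def rec(i, blocks):
        if i > n:
            yield [list(b) for b in blocks]
            return
        for b in blocks:
            b.append(i)
            yield from rec(i + 1, blocks)
            b.pop()
        blocks.append([i])
        yield from rec(i + 1, blocks)
        blocks.pop()
    yield from rec(1, [])


def block_sizes(blocks):
    """B(Pi_n): block sizes in order of appearance."""
    return tuple(len(b) for b in blocks)


def gap_encoding(blocks, n):
    """D(Pi_n): gaps between consecutive block minima, closed off at n + 1."""
    minima = [b[0] for b in blocks] + [n + 1]
    return tuple(hi - lo for lo, hi in zip(minima, minima[1:]))


def dominates(y, z):
    """True iff every partial sum of y is at least the matching partial sum of z."""
    return all(s >= t for s, t in zip(accumulate(y), accumulate(z)))


n = 6
violations = [blocks for blocks in set_partitions(n)
              if not dominates(block_sizes(blocks), gap_encoding(blocks, n))]
print(len(violations))  # 0
```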

Now consider a partially exchangeable law for $\Pi_n$, defined as in (1) by a
function $p$. This induces a law for the increments of $\Pi_n$, and by using the
encoding of binary sequences into $k$-tuples discussed above, this induces a law
on $k$-tuples:

$$q(d_1, \ldots, d_k) = P(D(\Pi_n) = (d_1, \ldots, d_k)) \tag{4}$$

Summing over all partitions with the same block structure, we obtain

$$q(d_1, \ldots, d_k) = \sum_{(b_1, \ldots, b_k) \in S_{n,k}} p(b_1, \ldots, b_k)\, r(d_1, \ldots, d_k; b_1, \ldots, b_k) \tag{5}$$

where $r(d_1, \ldots, d_k; b_1, \ldots, b_k)$ denotes the number of partitions $\Pi_n$ with block sizes
$B(\Pi_n) = (b_1, \ldots, b_k)$ and increments encoded as $D(\Pi_n) = (d_1, \ldots, d_k)$.

This gives a system of linear equations in the $q$'s and the $p$'s; we will
show it can be solved to compute $p$ in terms of $q$. Consider the dictionary
order on $S_{n,k}$: $(y_1, \ldots, y_k) \ge_d (z_1, \ldots, z_k)$ iff $y_1 > z_1$, or $y_1 = z_1, \ldots, y_m = z_m$
and $y_{m+1} > z_{m+1}$ for some $m$, or $y_i = z_i$ for all $i$. It is easy to see
that if $(y_1, \ldots, y_k) \ge (z_1, \ldots, z_k)$ in the partial order previously defined, then
$(y_1, \ldots, y_k) \ge_d (z_1, \ldots, z_k)$ in the dictionary order. Hence from Lemma 3 we
obtain

Lemma 4. For $y, z \in S_{n,k}$, $r(y; z) = 0$ unless $z \ge y$, and $r(y; z) = 1$ if $z = y$.

Now $S_{n,k}$ is totally ordered under the dictionary order, so we can arrange
its elements in decreasing order. But then the lemma says that the matrix of
the system of linear equations is triangular and its diagonal elements are all 1,
so it is trivially invertible and $p$ can be computed in terms of $q$. Explicitly, if
the elements of $S_{n,k}$ are $y_1 >_d y_2 >_d y_3 >_d \cdots$ then $p(y_1) = q(y_1)$ and for
$i \ge 2$, $p(y_i) = q(y_i) - \sum_{j=1}^{i-1} p(y_j)\, r(y_i; y_j)$. This completes the proof of
Theorem 1. □
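
The triangular inversion at the end of the proof can be illustrated numerically. In the sketch below the coefficients $r$ are obtained by brute-force enumeration, $q$ is computed from a set of weights $p$ through the linear system, and $p$ is then recovered by back-substitution in decreasing dictionary order. The weights are arbitrary hypothetical values (not normalized, which does not matter since the system is linear), and all function names are illustrative.

```python
from collections import Counter
from fractions import Fraction


def set_partitions(n):
    """Yield all partitions of {1..n} as lists of blocks in order of appearance."""
    def rec(i, blocks):
        if i > n:
            yield [list(b) for b in blocks]
            return
        for b in blocks:
            b.append(i)
            yield from rec(i + 1, blocks)
            b.pop()
        blocks.append([i])
        yield from rec(i + 1, blocks)
        blocks.pop()
    yield from rec(1, [])


def compositions(n, k):
    """Yield all k-tuples of positive integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(1, n - k + 2):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def count_r(n):
    """r[(d, b)] = number of partitions with gap encoding d and block sizes b."""
    r = Counter()
    for blocks in set_partitions(n):
        minima = [blk[0] for blk in blocks] + [n + 1]
        d = tuple(hi - lo for lo, hi in zip(minima, minima[1:]))
        r[(d, tuple(len(blk) for blk in blocks))] += 1
    return r


def recover_p(q, r, comps):
    """Back-substitution: p(y_i) = q(y_i) - sum_{j < i} p(y_j) r(y_i; y_j)."""
    order = sorted(comps, reverse=True)  # tuple comparison is the dictionary order
    p = {}
    for i, y in enumerate(order):
        p[y] = q[y] - sum(p[z] * r[(y, z)] for z in order[:i])
    return p


n, k = 5, 3
comps = list(compositions(n, k))
r = count_r(n)
p = {b: Fraction(1, idx + 2) for idx, b in enumerate(comps)}  # hypothetical weights
q = {d: sum(p[b] * r[(d, b)] for b in comps) for d in comps}
recovered = recover_p(q, r, comps)
print(recovered == p)  # True
```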



The following result gives an explicit formula and a generating function for
the coefficients $r(\cdot\,; \cdot)$.

Proposition 5. For $(d_1, \ldots, d_k) \in S_{n,k}$ and $(b_1, \ldots, b_k) \in S_{n,k}$, let $\mathcal{M}$ be the
set of $k \times k$ square matrices $M = (m_{ij})$ with the following properties:

(i) $M$ is upper triangular, so $m_{ij} = 0$ if $i > j$.

(ii) All other entries are non-negative integers, so $m_{ij} \ge 0$ if $i \le j$.
(iii) For all $i$, the $i$-th row sums to $b_i - 1$.
(iv) For all $i$, the $i$-th column sums to $d_i - 1$.
Then

$$r(d_1, \ldots, d_k; b_1, \ldots, b_k) = \sum_{M \in \mathcal{M}} \prod_{i} (d_i - 1)! \Big/ \prod_{i,j} m_{ij}! \tag{6}$$

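Hence the following generating function identity holds:

$$x_1^{d_1 - 1} (x_1 + x_2)^{d_2 - 1} \cdots (x_1 + x_2 + \cdots + x_k)^{d_k - 1} = \sum_{(b_1, \ldots, b_k) \in S_{n,k}} r(d_1, \ldots, d_k; b_1, \ldots, b_k)\, x_1^{b_1 - 1} x_2^{b_2 - 1} \cdots x_k^{b_k - 1} \tag{7}$$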

Proof. We need to count the number of partitions with block structure $b$ and
increment structure $d$. Let $A_i$ be the blocks of such a partition, in order of
appearance. Knowing $d$ is equivalent to knowing the smallest element of each
$A_i$, call them $a_i$ (with the convention $a_{k+1} = n + 1$). Let
$D_j = \{a_j, a_j + 1, \ldots, a_{j+1} - 1\}$ and let $n_{ij} = |A_i \cap D_j|$.
Of the elements in $D_j$, $a_j$ must belong to $A_j$, all others may be assigned to
any $A_i$ with $i \le j$, and the number of ways in which this can be done is
$(d_j - 1)!/(n_{1j}!\, n_{2j}! \cdots n_{j-1,j}!\, (n_{jj} - 1)!)$. If we let $m_{ij} = n_{ij}$ if $i \ne j$, $m_{jj} = n_{jj} - 1$,
then we obtain the desired formula. The generating function identity follows
easily. □
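
The generating function identity lends itself to a direct check: expand the product on the left-hand side and compare each coefficient with a brute-force count of partitions. The sketch below does this for $n = 6$; polynomials are stored as dictionaries from exponent tuples to coefficients, and all names are illustrative.

```python
from collections import Counter


def set_partitions(n):
    """Yield all partitions of {1..n} as lists of blocks in order of appearance."""
    def rec(i, blocks):
        if i > n:
            yield [list(b) for b in blocks]
            return
        for b in blocks:
            b.append(i)
            yield from rec(i + 1, blocks)
            b.pop()
        blocks.append([i])
        yield from rec(i + 1, blocks)
        blocks.pop()
    yield from rec(1, [])


def brute_force_r(n):
    """r[(d, b)] counted directly from all partitions of {1..n}."""
    r = Counter()
    for blocks in set_partitions(n):
        minima = [blk[0] for blk in blocks] + [n + 1]
        d = tuple(hi - lo for lo, hi in zip(minima, minima[1:]))
        r[(d, tuple(len(blk) for blk in blocks))] += 1
    return r


def generating_function_r(d):
    """Expand prod_j (x_1 + ... + x_j)^(d_j - 1); return {b: coefficient of x^(b - 1)}."""
    k = len(d)
    poly = Counter({(0,) * k: 1})
    for j, dj in enumerate(d):           # index j stands for the factor x_1 + ... + x_{j+1}
        for _ in range(dj - 1):
            nxt = Counter()
            for exps, coeff in poly.items():
                for i in range(j + 1):   # multiply by x_{i+1}
                    bumped = exps[:i] + (exps[i] + 1,) + exps[i + 1:]
                    nxt[bumped] += coeff
            poly = nxt
    return {tuple(e + 1 for e in exps): c for exps, c in poly.items()}


n = 6
brute = brute_force_r(n)
all_match = all(
    generating_function_r(d) == {b: c for (dd, b), c in brute.items() if dd == d}
    for d in {d for d, _ in brute}
)
print(all_match)  # True
```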

3 Partitions with Independent Increments 

We now prove Theorem 2.

Proof. Let $\Pi_n$ be an exchangeable random partition of $[n]$ such that its increments
$X_1, \ldots, X_n$ are independent. As above, let $p$ be the function that
describes its distribution as in (1), and $q$ the joint distribution of the increments.
Consider sequences of increments with $k = n - 1$; that is, there is only
one zero among $X_1, \ldots, X_n$. If the zero is at the beginning ($X_2 = 0$) then the
partition must be $\Pi_n = \{\{1, 2\}, \{3\}, \ldots, \{n\}\}$. Hence

$$q(1, 0, 1, \ldots, 1) = p(2, 1, \ldots, 1).$$

If the zero is at the end ($X_n = 0$) then $n$ could be in any of the $n - 1$
pre-existing blocks so there are $n - 1$ choices for $\Pi_n$. Since $\Pi_n$ is exchangeable,
all these choices are equally likely. Hence

$$q(1, 1, 1, \ldots, 1, 0) = (n - 1)\, p(2, 1, \ldots, 1).$$

But if we let $u_n = P(X_n = 1)$, by independence we also have

$$q(1, 0, 1, \ldots, 1) = (1 - u_2) u_3 \cdots u_n$$

and

$$q(1, 1, 1, \ldots, 1, 0) = u_1 u_2 \cdots u_{n-1} (1 - u_n).$$

If $u_2 \ne 0, 1$ then it follows easily by induction that $u_n \ne 0$ for all $n$ and hence

$$(n - 1)(1 - u_2)/u_2 = (1 - u_n)/u_n$$

so if we let $\theta = u_2/(1 - u_2)$ then $u_n = \theta/(n - 1 + \theta)$. Hence the increments
of the process have the same law as the increments of CRP($\theta$), so by Theorem 1 the process
must be CRP($\theta$).

It remains to consider the cases when $u_2$ is 0 or 1. If $u_2 = 1$ then by
induction $u_n = 1$ so all blocks are singletons (this is CRP($\theta$) in the limit case
$\theta = \infty$). If $u_2 = 0$ then $u_n = 0$ for all $n \ge 2$ ($\Pi_n$ cannot contain the singleton $\{1\}$ so by
exchangeability it cannot contain the singleton $\{n\}$ either) so there is only one
block (CRP($\theta$) for $\theta = 0$). □
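
The algebra in the proof can be double-checked with exact rational arithmetic: starting from any $u_2$ strictly between 0 and 1, setting $\theta = u_2/(1 - u_2)$ and $u_n = \theta/(n - 1 + \theta)$ should satisfy both the ratio identity and the relation between the two values of $q$. The value $u_2 = 2/5$ below is an arbitrary illustrative choice.

```python
from fractions import Fraction
from math import prod


def u(n, theta):
    """P(X_n = 1) = theta / (n - 1 + theta)."""
    return theta / (n - 1 + theta)


u2 = Fraction(2, 5)      # any value strictly between 0 and 1
theta = u2 / (1 - u2)    # theta = u_2 / (1 - u_2)

for n in range(2, 30):
    # q(1, 0, 1, ..., 1) and q(1, 1, ..., 1, 0) under independence (u_1 = 1)
    q_zero_first = (1 - u2) * prod(u(i, theta) for i in range(3, n + 1))
    q_zero_last = prod(u(i, theta) for i in range(2, n)) * (1 - u(n, theta))
    assert q_zero_last == (n - 1) * q_zero_first
    assert (n - 1) * (1 - u2) / u2 == (1 - u(n, theta)) / u(n, theta)

print(u(2, theta) == u2)  # True
```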



4 Another Binary Representation 

Theorem 1 allows us to obtain a partially exchangeable partition from any law on
random binary sequences satisfying certain constraints. While we have proved
the theorem for partitions of the finite set $[n]$, the result is easily extended to
infinite partitions. We thus obtain a correspondence between the set of distributions
of partially exchangeable partitions of $\mathbb{N}$, and the set of distributions
of infinite binary sequences. By Theorem 1 the correspondence is one-to-one;
it is not onto.

There is another way to associate infinite binary sequences to partitions,
which is discussed in detail in [1], Chapter 4. Given a binary sequence, the
problem is to construct an exchangeable random partition $\Pi$, so that the distribution
of the (unordered) block sizes of its restriction $\Pi_n$ to $[n]$ is the same as
the distribution of the "gaps" (distances) between consecutive 1's in the binary
sequence. More precisely:

Definition 4. Let $Y_1, Y_2, \ldots$ be a random infinite sequence with $Y_1 = 1$ and
$Y_n \in \{0, 1\}$ for all $n$, and let $\Pi$ be an exchangeable random partition of $\mathbb{N}$. Let
$1 = n_1 < n_2 < \ldots$ be the locations of the 1's in the sequence $Y_n$. For any fixed $n$,
let $n_k = \max\{i : i \le n, Y_i = 1\}$. Then

$$n = (n_2 - n_1) + \cdots + (n_k - n_{k-1}) + (n + 1 - n_k)$$

is a partition of $n$ into $k$ integers. If this has the same distribution as the
partition of $n$ induced by restricting $\Pi$ to $[n]$, then we say that $\Pi$ has a gap
representation by $Y$.

The two binary representations are related, but not identical. In particular, 
while any random partition has a representation by increments, it is not known 
under what conditions a random partition admits a gap representation. Also, 
note that the gap representation is interesting only for infinite partitions and 
sequences; for fixed $n$, any law for the sequence easily translates into a law for an
exchangeable partition, as we are free to specify the probabilities for all possible
block sizes. The problem is whether these laws are compatible as $n$ varies.

In [1], gap representations are constructed for various partitions, including
CRP($\theta$), for which the gap representation has $P(Y_n = 1) = \theta/(n - 1 + \theta)$ and
the $Y_n$ are independent. It is also proven that the only exchangeable random
partition which admits a gap representation via a sequence $Y$ of independent
binary random variables is CRP($\theta$). The similarity with Theorem 2 may seem
surprising, but it is explained by the following result:

Proposition 6. Suppose $\Pi$ has a gap representation by $Y$, and let $X$ be the
increments of $\Pi$. Then

$$X_1 + \cdots + X_n \overset{d}{=} Y_1 + \cdots + Y_n, \quad \forall n \ge 1 \tag{8}$$

Proof. Both sides of the identity are equal in distribution to the number of
blocks in $\Pi_n$. □

If $X$ and $Y$ are each sequences of independent variables, then (8) implies they
have the same distribution (in fact, it is enough to assume that the sequences
of partial sums $X_1 + \cdots + X_n$ and $Y_1 + \cdots + Y_n$ are (possibly non-homogeneous)
Markov chains).
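
For CRP($\theta$) the common law of both sides of (8) can be computed in two independent ways: by summing formula (2) over all set partitions of $[n]$ with a given number of blocks, and by convolving independent Bernoulli variables with success probabilities $\theta/(i - 1 + \theta)$. The sketch below compares the two exactly for $n = 6$; the function names and the value $\theta = 2/3$ are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial


def set_partitions(n):
    """Yield all partitions of {1..n} as lists of blocks in order of appearance."""
    def rec(i, blocks):
        if i > n:
            yield [list(b) for b in blocks]
            return
        for b in blocks:
            b.append(i)
            yield from rec(i + 1, blocks)
            b.pop()
        blocks.append([i])
        yield from rec(i + 1, blocks)
        blocks.pop()
    yield from rec(1, [])


def crp_probability(block_sizes, theta):
    """Formula (2): theta^k * prod (n_i - 1)! / prod_{i=0}^{n-1} (theta + i)."""
    n, k = sum(block_sizes), len(block_sizes)
    value = Fraction(theta) ** k
    for size in block_sizes:
        value *= factorial(size - 1)
    for i in range(n):
        value /= theta + i
    return value


def block_count_law(n, theta):
    """law[k] = P(number of blocks of CRP(theta) on [n] equals k), by enumeration."""
    law = [Fraction(0)] * (n + 1)
    for blocks in set_partitions(n):
        law[len(blocks)] += crp_probability([len(b) for b in blocks], theta)
    return law


def bernoulli_sum_law(n, theta):
    """law[k] = P(Y_1 + ... + Y_n = k) for independent Y_i with P(Y_i = 1) = theta/(i - 1 + theta)."""
    law = [Fraction(1)]  # law of the empty sum
    for i in range(1, n + 1):
        success = theta / (i - 1 + theta)
        nxt = [Fraction(0)] * (len(law) + 1)
        for s, prob in enumerate(law):
            nxt[s] += prob * (1 - success)
            nxt[s + 1] += prob * success
        law = nxt
    return law


theta = Fraction(2, 3)
print(block_count_law(6, theta) == bernoulli_sum_law(6, theta))  # True
```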

Acknowledgments. Many thanks to Jim Pitman for suggesting the prob- 
lem and for useful discussions. Thanks also to Noam Berger. 